Medical Document Clustering Using Ontology-Based Term Similarity Measures

Authors

  • Xiaodan Zhang
  • Liping Jing
  • Xiaohua Hu
  • Michael K. Ng
  • Jiali Xia
  • Xiaohua Zhou
Abstract

Recent research shows that an ontology used as background knowledge can improve document clustering quality through its concept hierarchy. Previous studies use term semantic similarity as an important measure for incorporating domain knowledge into the clustering process, for example in clustering initialization and term re-weighting. However, few studies have examined how different types of term similarity measures affect clustering performance in a given domain. In this article, we conduct a comparative study of how different term semantic similarity measures, including path-based, information-content-based, and feature-based measures, affect document clustering. Term re-weighting of the document vector is an important method for integrating a domain ontology into the clustering process. Specifically, the weight of a term is augmented by the weights of its co-occurring concepts. Spherical k-means is used to evaluate document vector re-weighting on two real-world datasets: Disease10 and OHSUMED23. Experimental results on nine different semantic measures show that: (1) no single type of similarity measure significantly outperforms the others; (2) several similarity measures perform more stably than the others; (3) term re-weighting has a positive effect on medical document clustering, but the effect might not be significant when documents contain few terms.

Author affiliations: Xiaodan Zhang (Drexel University, USA); Liping Jing (The University of Hong Kong, China); Xiaohua Hu (Drexel University, USA); Michael Ng (Hong Kong Baptist University, China); Jiali Xia (Jiangxi University of Finance and Economics, China); Xiaohua Zhou (Drexel University, USA).
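
The re-weighting idea described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not taken from the article: the toy is-a hierarchy `PARENT`, the choice of a path-based measure defined as one over the number of nodes on the shortest path, the `threshold` parameter, and the exact form of the update (each term's weight plus the similarity-scaled weights of the other terms in the same document). The final normalization to unit length reflects the fact that spherical k-means compares documents by cosine similarity.

```python
import math

# Hypothetical toy is-a hierarchy (child -> parent); not from the article.
PARENT = {
    "disease": None,
    "neoplasm": "disease",
    "infection": "disease",
    "carcinoma": "neoplasm",
    "sarcoma": "neoplasm",
}


def ancestors(concept):
    """Return the chain from a concept up to the root, inclusive."""
    chain = []
    while concept is not None:
        chain.append(concept)
        concept = PARENT[concept]
    return chain


def path_similarity(a, b):
    """Path-based similarity: 1 / (number of nodes on the shortest is-a path)."""
    up_a, up_b = ancestors(a), ancestors(b)
    steps_in_b = {c: i for i, c in enumerate(up_b)}
    for steps_in_a, c in enumerate(up_a):
        if c in steps_in_b:
            # edges on the path + 1 = nodes on the path
            return 1.0 / (steps_in_a + steps_in_b[c] + 1)
    return 0.0


def reweight(doc, sim, threshold=0.3):
    """Augment each term weight with similarity-scaled weights of co-occurring terms.

    `doc` maps term -> weight (e.g. TF-IDF). Only pairs whose similarity reaches
    `threshold` (an assumed cutoff) contribute. The result is scaled to unit length.
    """
    new = {}
    for term, weight in doc.items():
        boost = sum(
            sim(term, other) * other_weight
            for other, other_weight in doc.items()
            if other != term and sim(term, other) >= threshold
        )
        new[term] = weight + boost
    norm = math.sqrt(sum(v * v for v in new.values()))
    return {t: v / norm for t, v in new.items()} if norm else new


doc = {"carcinoma": 2.0, "sarcoma": 1.0, "infection": 1.0}
reweighted = reweight(doc, path_similarity)
```

In this toy document, "carcinoma" and "sarcoma" are siblings under "neoplasm" and so reinforce each other, while "infection" is too distant in the hierarchy to pass the assumed threshold and keeps its original weight before normalization. An information-content-based or feature-based measure could be passed as `sim` in place of `path_similarity` without changing `reweight`.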

Similar resources

A Comparative Study of Ontology Based Term Similarity Measures on PubMed Document Clustering

Recent research shows that ontology as background knowledge can improve document clustering quality with its concept hierarchy knowledge. Previous studies take term semantic similarity as an important measure to incorporate domain knowledge into clustering process such as clustering initialization and term re-weighting. However, not many studies have been focused on how different types of term ...

Document Clustering Based on Ontology and a Fuzzy Approach

Data mining, also known as knowledge discovery in databases, is the process of discovering unknown knowledge from a large amount of data. Text mining applies data mining techniques to extract knowledge from unstructured text. Text clustering is one of the important techniques of text mining; it is the unsupervised classification of similar documents into different groups. The most important step...

Web Service Clustering using a Hybrid Term-Similarity Measure with Ontology Learning

Clustering Web services into functionally similar clusters is a very efficient approach to service discovery. A principal issue for clustering is computing the semantic similarity between services. Current approaches use similarity-distance measurement methods such as keyword, information-retrieval or ontology based methods. These approaches have problems that include discovering semantic chara...

A Semantic approach for Text Clustering using WordNet based on Multi-Objective Genetic Algorithms

In this paper, we propose a method of MultiObjective Genetic Algorithms (MOGAs), NSGA-II and SPEA2, for document clustering with semantic similarity measures based on WordNet. The MOGAs showed a high performance compared to other clustering algorithms. The main problem in the application of MOGAs for document clustering in the Vector Space Model (VSM) is that it ignores relationships between im...

Semantic Search: Document Ranking and Clustering Using Computer Science Ontology and N-Grams

Semantic similarity has become an important tool and has been widely used to solve traditional Information Retrieval problems. This study adopts an ontology of computer science, proposes an ontology indexing weight based on Wu and Palmer’s edge counting measure, and uses the N-grams method for computing a family of word similarity. The study also compares the subsumption weight between Hliaoutakis a...

Publication date: 2008